Perspective: Why We Need More Algal Genomes
Abstract
In the current post-genomics world, a relevant question on the minds of many phycologists might be: do we really need more algal genomes, or should we stop and focus on the hard job of developing genetic tools and other resources for already sequenced taxa? This question has, in our opinion, a clear answer: we need to do both. Here, we focus on the genome sequencing side and discuss the following reasons why we think algal (and related heterotrophic protist) genome sequencing should remain a focus of phycological research: (1) transcriptomes that aim to create gene inventories or study gene expression differences (primarily Illumina RNAseq data), although cheap to produce and relatively easy to analyze, may not be sufficient for in-depth study of genomes, (2) much of natural biodiversity is still unstudied, necessitating approaches such as single cell genomics (SCG) that, although still challenging when applied to algae, can sample taxa isolated directly from the environment, (3) horizontal gene transfer (HGT) in algae is no longer controversial, but rather a major contributor to the evolution of photosynthetic lineages, and its study benefits greatly from completed (or draft) genomes, and (4) epigenetics and genome evolution among populations are best studied using assembled genome data.
The power of RNAseq data to support gene discovery and, in particular, the study of gene expression differences is clearly very high and has been exploited by many groups, including the large-scale Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP; Keeling et al. 2014; http://marinemicroeukaryotes.org/) and the 1,000 Plants (1KP; http://onekp.com/project.html) initiative, as well as an array of more focused studies. These data provide significant opportunities for phycologists and are (still) the most appropriate approach for groups such as dinoflagellates that have enormous nuclear genomes that may be several times the size of the human genome (e.g., Hou and Lin 2009).
Nonetheless, the benefits of RNAseq data can be greatly improved with the availability of reference genomes to build more accurate and complete gene models and to discriminate between nuclear “host” and contaminant genes. Whole genome analysis and annotation also provide information about gene structure (e.g., intron number and distribution; Nakamura et al. 2013), transposons (Read et al. 2013), and gene synteny (Blanc et al. 2012) that is invaluable for interpreting molecular evolution. In addition, sole use of transcriptome sequences may lead to assembly artifacts resulting in inflated gene counts and false contigs. This is partly explained by the significant coverage variation in assembled cDNA data that reflects gene expression differences across coding regions. This aspect, when compounded with sequencing errors in “over sequenced” areas can result in a poor or fragmented assembly (for discussion, see Martin and Wang 2011). In contrast, gene models predicted using complementary RNAseq and genome data generally provide a more complete and accurate inventory of nuclear genes, in particular, in identifying gene start and stop sites. Work with Porphyridium purpureum (Rhodophyta) showed that high quality RNAseq data alone identified 36,167 unique assembled contigs (N50 = 1,298nt), whereas only 8,355 genes were predicted in the genome. No obvious evidence of alternative splicing was found as an explanation for the high number of RNAseq-derived contigs in this red alga (Bhattacharya et al. 2013). Similarly, Shoguchi et al. (2013) assembled 63,104 unique RNAseq-derived contigs (N50 = 1,586nt) in a study of the coral symbiont Symbiodinium minutum (Dinophyceae), a genome that encodes ~42K genes. Therefore, genome data and the resulting gene models are the fundamental unit of study for many scientists and are used to provide templates for synthetic gene construction (Rockwell et al. 
2014), to understand protein targeting (i.e., the N-terminus is needed for this purpose; Emanuelsson et al. 2000), to determine accurately gene family size (e.g., pherophorins in Volvox carteri; Prochnik et al. 2010), to have a reliable reference to map cDNA transcripts
Received 22 August 2014. Accepted 10 November 2014.
Author for correspondence: e-mail [email protected].
Editorial Responsibility: M. Graham (Managing Editor)
J. Phycol. 51, 1–5 (2015)
© 2014 Phycological Society of America
DOI: 10.1111/jpy.12267
Publication date: 2015